Fast optimal leaf ordering for hierarchical clustering

Authors

  • Ziv Bar-Joseph
  • David K. Gifford
  • Tommi S. Jaakkola
Abstract

We present the first practical algorithm for the optimal linear leaf ordering of trees that are generated by hierarchical clustering. Hierarchical clustering has been extensively used to analyze gene expression data, and we show how optimal leaf ordering can reveal biological structure that is not observed with an existing heuristic ordering method. For a tree with $n$ leaves, there are $2^{n-1}$ linear orderings consistent with the structure of the tree. Our optimal leaf ordering algorithm runs in time $O(n^4)$, and we present further improvements that make the running time of our algorithm practical.
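The abstract states the counts and the time bound but not the algorithm itself. As a rough illustration, here is a minimal Python sketch of a straightforward dynamic program for optimal leaf ordering. It is not the authors' implementation, and it does not reproduce the further speed-ups the abstract mentions. It rests on several assumptions. The tree is binary and encoded as nested 2-tuples with integer leaves, which is a hypothetical encoding. The objective is to maximize the sum of similarities `S[a][b]` between adjacent leaves, with `S` symmetric. The helper names `leaves`, `all_orderings`, `score` and `optimal_order` are also hypothetical. Each flip of one of the $n-1$ internal nodes doubles the number of orderings, which gives the $2^{n-1}$ count. The DP keeps, for every subtree, the best score for each possible (leftmost leaf, rightmost leaf) pair. Its four nested loops over leaves give $O(n^4)$ time.

```python
import math
from itertools import product

# Hypothetical encoding: a leaf is an int (a row/column index of S);
# an internal node is a 2-tuple (left_subtree, right_subtree).


def leaves(node):
    """Leaves of a subtree in their current left-to-right order."""
    if isinstance(node, int):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def all_orderings(node):
    """Every leaf order obtainable by flipping internal nodes (2^(n-1) total)."""
    if isinstance(node, int):
        return [[node]]
    result = []
    for a, b in product(all_orderings(node[0]), all_orderings(node[1])):
        result.append(a + b)  # children in the given order
        result.append(b + a)  # children swapped
    return result


def score(order, S):
    """Assumed objective: sum of similarities between adjacent leaves."""
    return sum(S[a][b] for a, b in zip(order, order[1:]))


def optimal_order(tree, S):
    """Return (best score, best order) over all tree-consistent orders.

    solve(v) maps each feasible (leftmost i, rightmost j) pair of v's leaves
    to (best score, order). Four nested leaf loops give O(n^4) time overall.
    """

    def solve(node):
        if isinstance(node, int):
            return {(node, node): (0.0, [node])}
        w, x = node
        Mw, Mx = solve(w), solve(x)
        Lw, Lx = leaves(w), leaves(x)
        table = {}
        for i, j in product(Lw, Lx):
            best, best_km = -math.inf, None
            # k: rightmost leaf of w's block, m: leftmost leaf of x's block.
            for k, m in product(Lw, Lx):
                if (i, k) in Mw and (m, j) in Mx:
                    val = Mw[(i, k)][0] + S[k][m] + Mx[(m, j)][0]
                    if val > best:
                        best, best_km = val, (k, m)
            k, m = best_km
            order = Mw[(i, k)][1] + Mx[(m, j)][1]
            table[(i, j)] = (best, order)
            # Flipping v reverses the order; the score is unchanged because S
            # is assumed symmetric.
            table[(j, i)] = (best, order[::-1])
        return table

    return max(solve(tree).values(), key=lambda entry: entry[0])
```

As a small example, take the tree `((0, 1), (2, 3))` with `S` zero everywhere except `S[0][3] = S[3][0] = 1.0`. `optimal_order` should then return a score of 1.0, with leaves 0 and 3 adjacent. The unflipped order `[0, 1, 2, 3]` scores 0. The `all_orderings` brute force is only practical for small trees, but it is useful for checking the DP.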

Similar resources

Clustering, Leaf-ordering and Visualization for Intuitive Analysis of Deoxyribonucleic-Acid Chip Data

Generally the result data from DNA chip experiments have lots of gene expression information. Scientists want to get perspective insight or want to find intuitive fact from that data. Hierarchical clustering is the most widely used method for analysis of gene expression data. In this paper, we address leaf-ordering, which is a post-processing for the dendrograms – a sort of edge-weighted binary...

K-ary Clustering with Optimal Leaf Ordering for Gene Expression Data

MOTIVATION A major challenge in gene expression analysis is effective data organization and visualization. One of the most popular tools for this task is hierarchical clustering. Hierarchical clustering allows a user to view relationships in scales ranging from single genes to large sets of genes, while at the same time providing a global view of the expression data. However, hierarchical clust...

Repeated Record Ordering for Constrained Size Clustering

One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...

dendsort: modular leaf ordering methods for dendrogram representations in R

Dendrograms are graphical representations of binary tree structures resulting from agglomerative hierarchical clustering. In Life Science, a cluster heat map is a widely accepted visualization technique that utilizes the leaf order of a dendrogram to reorder the rows and columns of the data table. The derived linear order is more meaningful than a random order, because it groups similar items t...

Optimal Arrangement of Leaves in the Tree Representing Hierarchical Clustering of Gene Expression Data

In this paper, we study how to present gene expression data to display similarities by trying to find a linear ordering of genes such that genes with similar expression profiles will be close in this ordering. In general, finding the best possible order is intractable, and furthermore an unrestricted ordering may not be desired. Therefore we concentrate on the case in which hierarchical cluster...

Journal:
  • Bioinformatics

Volume 17, Suppl 1

Publication date: 2001